Modeling phone duration: application to Catalan TTS
Authors
Abstract
Many exhaustive studies deal with models of segmental duration. This paper evaluates some of the properties mentioned in the literature and compares factorial and sum-of-products models against a list-like approach for Catalan, as a basis for a more exhaustive study of duration in this language. Sum-of-products models for vowels and for subsystems of consonants seem more adequate for modeling phone duration. The parameters of the sum-of-products models are presented in the paper.
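As a rough illustration of the general shape of a sum-of-products duration model, the sketch below predicts a duration as a sum of terms, each term being a product of per-factor parameters. It follows the usual formulation in the duration-modeling literature and is not the model fitted in this paper: the factor names (`phone`, `stress`, `position`), the parameter values, and the function `predict_duration` are all hypothetical and chosen only for illustration.

```python
from math import prod

# Hypothetical parameter tables (illustrative values only, NOT from the paper).
# Each entry in TERMS is one product term: factor name -> {level: parameter}.
TERMS = [
    # Multiplicative term: intrinsic duration (ms) scaled by stress and position.
    {
        "phone": {"a": 80.0, "i": 60.0},
        "stress": {"stressed": 1.2, "unstressed": 1.0},
        "position": {"final": 1.4, "medial": 1.0},
    },
    # Additive term: a stress-dependent offset (ms).
    {
        "stress": {"stressed": 10.0, "unstressed": 0.0},
    },
]


def predict_duration(context, terms=TERMS):
    """Return the sum over terms of the product of each term's factor parameters."""
    return sum(
        prod(table[context[factor]] for factor, table in term.items())
        for term in terms
    )


if __name__ == "__main__":
    ctx = {"phone": "a", "stress": "stressed", "position": "final"}
    print(round(predict_duration(ctx), 1))
```

A purely multiplicative model corresponds to a single term and a purely additive model to one single-factor term per factor, so both are special cases of this form. A list-like approach would instead store one duration per observed context.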
Similar resources
Catalan vowel duration
The temporal organization of discourse has given rise to a large body of work in several languages with different aims, from studies that identify cues to the planning of the linguistic message to studies that propose duration models for text-to-speech systems. This work is a first step towards the description of Catalan vowel duration. Considering the Catala...
Analysis of Duration Prediction Accuracy in HMM-Based Speech Synthesis
Appropriate phoneme durations are essential for high-quality speech synthesis. In hidden Markov model-based text-to-speech (HMM-TTS), durations are typically modeled statistically using state duration probability distributions and duration prediction for unseen contexts. Use of rich context features enables synthesis without high-level linguistic knowledge. In this paper we analyze the accuracy ...
Modeling vowel duration for Japanese text-to-speech synthesis
Accurate estimation of segmental durations is crucial for natural-sounding text-to-speech (TTS) synthesis. This paper presents a model of vowel duration used in the Bell Labs Japanese TTS system. We describe the constraints on vowel devoicing, and effects of factors such as phone identity, surrounding phone identities, accentuation, syllabic structure, and phrasal position on the duration of bot...
Modeling segmental durations for Japanese text-to-speech synthesis
Accurate estimation of segmental durations is crucial for natural-sounding text-to-speech (TTS) synthesis. This paper presents a model of segmental duration used in the Bell Labs Japanese TTS system. We describe the constraints on vowel devoicing, and effects of factors such as phone identity, surrounding phone identities, accentuation, syllabic structure, and phrasal position on the duration of...
Modeling and Synthesizing Emotional Speech for Catalan Text-to-Speech Synthesis
This paper describes an initial approach to emotional speech synthesis in Catalan based on a diphone concatenation TTS system. The main goal of this work is to develop a simple prosodic model for expressive synthesis. This model is obtained from an emotional speech collection artificially generated by means of a copy-prosody experiment. After validating the emotional content of this collection,...